Abstract

To discuss the existence and uniqueness of proper scoring rules one
needs to extend the associated entropy functions as sublinear functions
to the conic hull of the prediction set. In some natural function spaces,
such as the Lebesgue $L^p$-spaces over $\mathbb{R}^d$, the positive cones
have empty interior. Entropy functions defined on such cones have only
directional derivatives. Certain entropies may be further extended
continuously to open cones in normed spaces containing signed densities.
The extended entropies are Gâteaux differentiable except on a negligible
set and have everywhere continuous subgradients due to the supporting
hyperplane theorem. We introduce the necessary framework from analysis
and algebra that allows us to give an affirmative answer to the titular
question of the paper. As a result, we give a formal sense in which
entropy functions have uniquely associated proper scoring rules.

We illustrate our framework by studying the derivatives and subgradients
of the following three prototypical entropies: Shannon entropy,
Hyvärinen entropy, and quadratic entropy.

1 Introduction

Proper scoring rules have attracted a lot of interest in recent years in
disparate fields such as statistics, decision theory, machine learning,
game theory, finance, and meteorology. They provide practical measures
for assessing the accuracy and precision of probabilistic forecasts. In
this paper, we build a general measure-theoretic framework for proper
scoring rules that allows us to consider their existence and uniqueness
as subgradients of sublinear functions.

1.1 Definitions 

Let $(\Omega, \mathcal{A}, \mu)$ be a measure space and $\mathcal{P}$ be a
convex set of probability densities on $\Omega$ with respect to the measure
$\mu$. A random variable $X$ takes values in $\Omega$ with unknown true
density $p \in \mathcal{P}$. We refer to $\mathcal{P}$ and its elements as
a prediction set and predictive densities for $X$, respectively. By
$\mathcal{L}(\mathcal{P})$ we denote the set of all $\mu$-measurable
functions $f : \Omega \to \mathbb{R}$ such that

$$\int_\Omega |f(x)|\, p(x)\, d\mu(x) < \infty$$

for all $p \in \mathcal{P}$. We call the elements of
$\mathcal{L}(\mathcal{P})$ $\mathcal{P}$-integrable functions.


A scoring rule $S : \mathcal{P} \to \mathcal{L}(\mathcal{P})$ assigns to
each predictive density $q \in \mathcal{P}$ a $\mathcal{P}$-integrable
function $S(q)$. The value of $S(q)$ at $x \in \Omega$ is interpreted as
a numerical score assigned to the outcome $x$. We take scoring rules to be
positively orientated, that is, they are viewed as incentives which a
forecaster wishes to maximise. It is customary to term $S$ proper if the
expected value of $S$ at $q$,

$$p \cdot S(q) = \int_\Omega S(q)(x)\, p(x)\, d\mu(x),$$

is maximised in $q$ at the true density $q = p$, and strictly proper if the
true density is the only maximiser.
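As a minimal illustration of propriety, the sketch below works on a finite sample space with counting measure, which is an assumption made here for simplicity and not the general setting of the paper. It uses the logarithmic score $S(q)(x) = \ln q(x)$ and checks that, among a handful of hypothetical quotes, the expected score under the true distribution is largest when the true distribution itself is quoted. The names `log_score`, `expected_score` and `best_quote` are illustrative only.

```python
import math

def log_score(q):
    """Logarithmic score S(q)(x) = ln q(x) for a strictly positive probability vector q."""
    return [math.log(qi) for qi in q]

def expected_score(p, q):
    """Expected score p . S(q) = sum_x p(x) S(q)(x) when the true distribution is p."""
    return sum(pi * si for pi, si in zip(p, log_score(q)))

def best_quote(p, candidates):
    """Return the candidate quote with the largest expected score under p."""
    return max(candidates, key=lambda q: expected_score(p, q))

# Hypothetical true distribution and competing quotes on a three-point sample space.
p = [0.5, 0.3, 0.2]
candidates = [[0.5, 0.3, 0.2], [0.4, 0.4, 0.2], [0.2, 0.3, 0.5], [1 / 3, 1 / 3, 1 / 3]]
print(best_quote(p, candidates))
```

A finite list of candidates can only suggest propriety; for the logarithmic score the general statement is Gibbs' inequality.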

Strictly proper scoring rules could be used as a bonus system under which
truth-telling is the only optimal long-term strategy (Gneiting and Raftery,
2007). For such an $S$, the optimal expected reward is the (negative)
entropy induced by $S$,

$$\Phi : \mathcal{P} \to \mathbb{R}, \quad \Phi(p) = p \cdot S(p),$$

(Parry et al., 2012). In what follows, we refer to $\Phi$ simply as the
entropy function associated to $S$, as there is no danger of confusion
between negative and positive entropy functions in the present context.
The regret for quoting $q$ instead of the true density $p$ is expressed by
the function

$$D : \mathcal{P} \times \mathcal{P} \to \mathbb{R}, \quad D(p, q) = p \cdot S(p) - p \cdot S(q),$$

which in the statistics literature is also known as the divergence induced
by $S$. In the present paper, we shall use the notions of entropy and
divergence in a more general sense by replacing strict propriety with
propriety.
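The entropy and divergence induced by a scoring rule can be computed directly from their definitions. The sketch below does so for the quadratic score on a finite sample space with counting measure (an assumption made for illustration), written in the positively oriented form $S(q)(x) = 2q(x) - \sum_y q(y)^2$. In this discrete setting the divergence reduces to the squared Euclidean distance between $p$ and $q$. The function names are hypothetical.

```python
def quadratic_score(q):
    """Positively oriented quadratic score S(q)(x) = 2 q(x) - sum_y q(y)^2."""
    norm_sq = sum(qi * qi for qi in q)
    return [2.0 * qi - norm_sq for qi in q]

def pairing(p, f):
    """The pairing p . f = sum_x p(x) f(x)."""
    return sum(pi * fi for pi, fi in zip(p, f))

def entropy(p, rule):
    """Entropy Phi(p) = p . S(p) induced by the scoring rule."""
    return pairing(p, rule(p))

def divergence(p, q, rule):
    """Divergence D(p, q) = p . S(p) - p . S(q): the regret for quoting q when p is true."""
    return entropy(p, rule) - pairing(p, rule(q))

p = [0.5, 0.3, 0.2]
q = [0.2, 0.3, 0.5]
print(divergence(p, q, quadratic_score))  # close to |p - q|^2 = 0.18
```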

General overviews of proper scoring rules may be found in Gneiting and
Katzfuss (2014); Gneiting and Raftery (2007) in connection to probabilistic
forecasting, and also in Dawid and Musio (2014), where the emphasis is on
statistical inference. Theoretical aspects of proper scoring rules are
studied in Dawid (2007); Grünwald and Dawid (2004); Williamson (2014).
Frongillo and Kash (2014) investigate proper scoring rules in connection
with the elicitation of private information. The remaining references
throughout the text provide links to more specific uses of scoring rules.

1.2 Motivation and Scope of the Paper 

In this paper we adopt the theoretical framework of Hendrickson and Buehler
(1971). This approach is characterised by exploiting a beautiful connection
with Euler's homogeneous function theorem, which presupposes that we extend
our quantities of interest as homogeneous functions to the conic hull of
the prediction set. To that end, we introduce the prediction cone
$\mathcal{P}^+ = \{\lambda p \mid \lambda > 0,\ p \in \mathcal{P}\}$ and
extend $S$ and $\Phi$ to $\mathcal{P}^+$ as homogeneous functions of
degrees zero and one, respectively. Any $\mathcal{P}$-integrable function
$q^*$ satisfying

$$\Phi(p) \ge p \cdot q^*, \quad \forall p \in \mathcal{P}^+,$$

with equality for $p = q$, is called a $\mathcal{P}$-integrable subgradient
of $\Phi$ at $q$. The subgradient is called strict if the above inequality
is strict for all $p \in \mathcal{P}^+$ not positively collinear to $q$.
Suppose that $\Phi$ has a subgradient $S(q) \in \mathcal{L}(\mathcal{P})$
at each $q \in \mathcal{P}^+$ and the resulting map
$S : \mathcal{P}^+ \to \mathcal{L}(\mathcal{P})$ is homogeneous of degree
zero. We call $S$ a $\mathcal{P}$-integrable subgradient of $\Phi$ on
$\mathcal{P}^+$. We recall that a (strictly) convex homogeneous function of
degree one is a (strictly) sublinear function. We may now state Hendrickson
and Buehler's classical result in a slightly more contemporary language.

Theorem 1.1. Let $\mathcal{P}$ be a prediction set with respect to the
measure space $(\Omega, \mathcal{A}, \mu)$. A scoring rule
$S : \mathcal{P}^+ \to \mathcal{L}(\mathcal{P})$ is (strictly) proper if and
only if there is a (strictly) sublinear function
$\Phi : \mathcal{P}^+ \to \mathbb{R}$ such that $S$ is a subgradient of
$\Phi$ on $\mathcal{P}^+$.

Theorem 1.1 provides us with a basic but insufficient theoretical framework
to discuss the titular question of this paper. In support of this claim,
in Example B.2 we show the existence of a sublinear function that has
unique but non-$\mathcal{P}$-integrable subgradients at some points of its
domain, while at other points it has multiple $\mathcal{P}$-integrable
subgradients. The most important structure missing in Theorem 1.1 is the
notion of interior of a convex domain, which lies at the intersection of
geometry, algebra, and topology, and may have different incarnations
depending on the context (Borwein and Vanderwerff, 2010; Rockafellar, 1972).
For example, studying proper local scoring rules on discrete sample spaces,
Dawid et al. (2012) apply Theorem 1.1 in a context where the prediction
cone is the interior of the positive orthant in a finite-dimensional
Euclidean space. In this case, well-known results from convex analysis give
necessary and sufficient conditions for an affirmative answer to our basic
question. The real focus of our paper is thus the non-Euclidean case in the
abstract measure-theoretic setting introduced above.

In Proposition 2.4 and Example B.3, we show that at boundary points
sublinear functions have either no subgradient, or infinitely many.
Therefore, it is paramount to try to define entropy functions on interiors
of positive cones. In infinite dimensions, however, this is not always
possible. Indeed, it is well known that the positive cones in many natural
function spaces (such as the Lebesgue $L^p$-spaces over $\mathbb{R}^d$)
have empty interiors (Borwein and Lewis, 1992) and are negligible sets in
terms of Baire category. This calls for a more subtle approach to our
problem in which we need to refine our notion of interior and boundary.
Inspired by geometric functional analysis, we adapt an algebraic refinement
of the notion of interior of convex sets, whose better known topological
analogues are often referred to as quasi-interior (Borwein and Lewis, 1992;
Fullerton and Braunschweiger, 1963). Common entropies whose domains are
positive cones with empty interior but nonempty quasi-interior are the
Shannon entropy, the Hyvärinen entropy, and in principle, the entropies
associated with the proper local scoring rules of arbitrary orders. These
entropies are formally not differentiable functions but possess directional
derivatives on large subspaces, which display similar properties to
standard gradients.

Other entropies, such as those that are associated with the families of
power scoring rules and pseudospherical scoring rules, may be extended
continuously to open cones in normed spaces that contain signed densities.
Geometrically, this setting is similar to the Euclidean setting. One applies
the supporting hyperplane theorem and other standard results in analysis
relating subgradients and Gâteaux derivatives. The latter entropies are
Gâteaux differentiable (either everywhere or outside a negligible set),
which we illustrate in the context of the quadratic scoring rule.

The original part of the paper is concerned with the analysis of the
notion of $\mathcal{P}$-integrable subgradient introduced by Hendrickson
and Buehler (1971) and the associated most basic general framework for
proper scoring rules. To address the question of existence and uniqueness
of proper scoring rules, we equip this framework with a notion of algebraic
quasi-interior. As an illustration, we show that the Hyvärinen scoring rule
is the unique 0-homogeneous $\mathcal{P}$-integrable subgradient of its
entropy function on the (nonempty) quasi-interior of a suitable positive
cone.

The paper is organised as follows. In Section 2, we introduce the notation
and present all the background facts. Section 3 contains our main results,
which formulate necessary and sufficient conditions for existence and
uniqueness of subgradients of entropy functions. In Section 4, we illustrate
the theory with applications to three prototypical entropy functions,
namely, the Shannon, Hyvärinen, and quadratic entropy. These examples
formalise the sense in which we may consider each entropy to have a
uniquely associated proper scoring rule. We complete the main part of the
paper in Section 5 with some closing remarks. The proofs of all formal
assertions made in the text are given in Appendix A. In Appendix B, we
present additional facts that illustrate various points made in the
Introduction or later in the text.

2 Notation and Preliminaries 

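Let $E, E_1, E_2$ be sets of $\mu$-measurable functions on $\Omega$. For
$a \in \mathbb{R}$, we use the notation

$$aE_1 = \{af \mid f \in E_1\}, \qquad E_1 + E_2 = \{f + g \mid f \in E_1,\ g \in E_2\}.$$

The (blunt) cone of $E$ is the set
$E^+ = \{\lambda f \mid \lambda > 0,\ f \in E\}$, while the pointed cone of
$E$ is the set $E^+ \cup \{0\}$. The convex hull of $E$,

$$\mathrm{co}\,E = \Big\{ \sum_{i=1}^k \lambda_i f_i \;\Big|\; f_i \in E,\ \lambda_i \ge 0,\ \sum_{i=1}^k \lambda_i = 1,\ k \in \mathbb{N} \Big\},$$

is the set of all convex combinations of elements of $E$. The conic hull
of $E$,

$$\mathrm{cone}\,E = \Big\{ \sum_{i=1}^k \lambda_i f_i \;\Big|\; f_i \in E,\ \lambda_i \ge 0,\ k \in \mathbb{N} \Big\},$$

is the set of all conic combinations of elements of $E$. By

$$\mathrm{span}\,E = \Big\{ \sum_{i=1}^k \lambda_i f_i \;\Big|\; f_i \in E,\ \lambda_i \in \mathbb{R},\ k \in \mathbb{N} \Big\}$$

we denote the set of all linear combinations of elements of $E$, and we
refer to it as the linear span of $E$.

A set $E$ is called convex if $\mathrm{co}\,E = E$, a cone if $E = E^+$ or
$E = E^+ \cup \{0\}$, a convex cone if $E = \mathrm{cone}\,E$ or
$E = \mathrm{cone}\,E \setminus \{0\}$, and a linear space if
$E = \mathrm{span}\,E$. If $E$ is convex,
$E^+ = \mathrm{cone}\,E \setminus \{0\}$ is a convex cone.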

The epigraph of $\Phi : E \to \mathbb{R}$ is the set in
$\mathrm{span}\,E \times \mathbb{R}$ given by

$$\mathrm{epi}\,\Phi = \{(f, y) \mid f \in E,\ y \in \mathbb{R},\ y \ge \Phi(f)\}.$$

The graph of $\Phi$ is the set $\{(f, \Phi(f)) \mid f \in E\}$.

A function $\Phi : E \to \mathbb{R}$ is called convex if its epigraph is a
convex set. The definition implies that $E$ is convex. Equivalently, $\Phi$
is convex if, for any $f, g \in E$ and $\lambda \in (0, 1)$, $\Phi$
satisfies

$$\Phi((1 - \lambda) f + \lambda g) \le (1 - \lambda)\Phi(f) + \lambda \Phi(g).$$

If the inequality is strict for $f \ne g$, then $\Phi$ is called strictly
convex.

A function $\Phi : E^+ \to \mathbb{R}$ is said to be (positively)
homogeneous of degree $k$, for $k \in \mathbb{R}$, or (positively)
$k$-homogeneous, if for every $f \in E^+$ and every $\lambda > 0$, it holds
that $\Phi(\lambda f) = \lambda^k \Phi(f)$. A function
$\Phi : E \to \mathbb{R}$ is said to be subadditive if $\Phi$ satisfies

$$\Phi(f + g) \le \Phi(f) + \Phi(g)$$

for all $f, g \in E$, and strictly subadditive if the above inequality is
strict for $f \ne g$. We need to modify slightly the latter definition in
the case when $\Phi : E^+ \to \mathbb{R}$ is 1-homogeneous. Then we say
that $\Phi$ is strictly subadditive if the above inequality is strict
whenever $f, g \in E^+$ are not positively collinear. Functions that are
1-homogeneous and (strictly) subadditive are called (strictly) sublinear.
It is easy to see that $\Phi : E^+ \to \mathbb{R}$ is (strictly) sublinear
if and only if $\Phi$ is (strictly) convex on $E$ and 1-homogeneous on
$E^+$.
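The two defining properties of a sublinear function can be checked numerically on a toy example. The sketch below assumes a finite sample space, so that the cone consists of strictly positive vectors, and uses the 1-homogeneous form of the Shannon entropy, $\Phi(q) = \sum_i q_i \ln(q_i / \sum_j q_j)$. The helper names are illustrative.

```python
import math

def shannon(q):
    """1-homogeneous Shannon entropy on positive vectors: sum_i q_i ln(q_i / sum_j q_j)."""
    total = sum(q)
    return sum(qi * math.log(qi / total) for qi in q)

def scale(lam, q):
    """Multiply the vector q by the scalar lam."""
    return [lam * qi for qi in q]

def add(f, g):
    """Componentwise sum of two vectors."""
    return [a + b for a, b in zip(f, g)]

f = [1.0, 2.0, 3.0]
g = [4.0, 1.0, 1.0]

# 1-homogeneity: Phi(lam f) = lam Phi(f).
print(shannon(scale(2.5, f)), 2.5 * shannon(f))
# Subadditivity: Phi(f + g) <= Phi(f) + Phi(g); the inequality is strict here
# because f and g are not positively collinear.
print(shannon(add(f, g)), shannon(f) + shannon(g))
```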

Let $\mathcal{P}$ be a prediction set with respect to
$(\Omega, \mathcal{A}, \mu)$ and let $E \subset \mathrm{span}\,\mathcal{P}$.
By $E^\perp$ we denote the annihilator of $E$ in $\mathcal{L}(\mathcal{P})$,
that is, all $f \in \mathcal{L}(\mathcal{P})$ such that

$$p \cdot f = 0$$

for all $p \in E$. Clearly, $E^\perp$ is a linear subspace of
$\mathcal{L}(\mathcal{P})$. In the case when $E^\perp = \{0\}$, we say that
$E$ has a trivial annihilator.

By a direction in a vector space we understand the equivalence class
of all positively collinear vectors to a given nonzero vector. Note that any
0-homogeneous function is a function of directions. For
$q \in \mathcal{P}^+$, we define the set of directions from $q$ to the
points in $\mathcal{P}^+$ as

$$\mathcal{D}(q) = \{p \in \mathrm{span}\,\mathcal{P} \mid \exists \varepsilon_p > 0,\ \forall t \in [0, \varepsilon_p],\ q + tp \in \mathcal{P}^+\}
= \{p \in \mathrm{span}\,\mathcal{P} \mid \exists \varepsilon_p > 0,\ q + \varepsilon_p p \in \mathcal{P}^+\}.$$

We have the latter identity due to the convexity of $\mathcal{P}^+$.

A point $q \in \mathcal{P}^+$ is an algebraically interior point of
$\mathcal{P}^+$ if $\mathcal{D}(q) = \mathrm{span}\,\mathcal{P}$. The
collection of all algebraically interior points of $\mathcal{P}^+$ is
called the algebraic interior of $\mathcal{P}^+$. In the case of a
topological vector space, the topological interior of a set is always
contained in the algebraic interior of the set. Moreover, when the
topological interior is not empty, the two notions coincide. If $q$ is
not algebraically interior for $\mathcal{P}^+$, that is,
$\mathcal{D}(q) \ne \mathrm{span}\,\mathcal{P}$, we say that $q$ is a
boundary point for $\mathcal{P}^+$. If $\mathcal{P}^+$ has empty algebraic
interior, then the prediction cone consists entirely of boundary points.
This case occurs frequently in the context of continuous sample spaces,
see e.g. Proposition B.1.

Lemma 2.1. For each $q \in \mathcal{P}^+$, we have the representation

$$\mathcal{D}(q) = \mathrm{cone}(\mathcal{P}^+ - q).$$

For a point $q \in \mathcal{P}^+$, we define
$\mathcal{O}(q) = \mathcal{D}(q) \cap -\mathcal{D}(q)$. This is the subset
of directions in $\mathcal{D}(q)$ whose inverse is also in $\mathcal{D}(q)$.
The set may be identified with those directions in
$\mathrm{span}\,\mathcal{P}$ along which there is an open line segment
that contains $q$ and is contained in $\mathcal{P}^+$. Clearly, $q$ is
algebraically interior for $\mathcal{P}^+$ if and only if
$\mathcal{O}(q) = \mathcal{D}(q) = \mathrm{span}\,\mathcal{P}$. By
construction, $\mathcal{O}(q)$ is a linear subspace of
$\mathrm{span}\,\mathcal{P}$. The sets of directions $\mathcal{D}(q)$ and
$\mathcal{O}(q)$ are instrumental for defining various notions of
directional derivatives.

The most basic directional derivative is the following one.

Definition 2.2. For a function $\Phi : \mathcal{P}^+ \to \mathbb{R}$, the
right directional derivative of $\Phi$ at $q \in \mathcal{P}^+$ along
$p \in \mathcal{D}(q)$ is defined as

$$\Phi'_+(p, q) = \lim_{t \to 0+} \frac{\Phi(q + tp) - \Phi(q)}{t} \tag{1}$$

if the limit exists.

We gather below the main properties of $\Phi'_+(\cdot, q)$.

Proposition 2.3. Let $\Phi : \mathcal{P}^+ \to \mathbb{R}$ be a sublinear
function and $q \in \mathcal{P}^+$. We have

(a) for each $p \in \mathcal{D}(q)$,

$$\Phi'_+(p, q) = \inf_{t > 0} \frac{\Phi(q + tp) - \Phi(q)}{t} \in \mathbb{R} \cup \{-\infty\},$$

and the infimum is finite for $p \in \mathcal{O}(q)$;

(b) $\Phi'_+(\cdot, q) : \mathcal{D}(q) \to \mathbb{R} \cup \{-\infty\}$ is sublinear;

(c) for each $\lambda > 0$, $\Phi'_+(p, \lambda q) = \Phi'_+(p, q)$;

(d) for each $p \in \mathcal{P}^+$,

$$\Phi(p) \ge \Phi'_+(p, q),$$

with equality for $p = q$;

(e) for each $p \in \mathcal{O}(q)$, $-\Phi'_+(-p, q) \le \Phi'_+(p, q)$;

(f) the set

$$\mathcal{O}'(q) = \{p \in \mathcal{O}(q) \mid -\Phi'_+(-p, q) = \Phi'_+(p, q)\}$$

is a linear subspace of $\mathcal{O}(q)$ and the restriction
$\Phi'_+(\cdot, q)|_{\mathcal{O}'(q)}$ is linear.
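Parts (a), (e) and (f) of the proposition can be seen at work in a toy setting. The sketch below is only an illustration on the plane $\mathbb{R}^2$, where every direction is admissible, rather than on a prediction cone; the functions `euclid` and `l1` are two standard sublinear functions chosen here as assumptions. For the Euclidean norm the difference quotient decreases as $t$ decreases, in line with the infimum formula. For the $\ell^1$ norm at $q = (1, 0)$, the vertical direction has a strict gap between $-\Phi'_+(-p, q)$ and $\Phi'_+(p, q)$, while the horizontal direction has none.

```python
import math

def euclid(v):
    """Euclidean norm on the plane, a sublinear function."""
    return math.hypot(v[0], v[1])

def l1(v):
    """l1 norm on the plane, sublinear with kinks on the coordinate axes."""
    return abs(v[0]) + abs(v[1])

def quotient(phi, q, p, t):
    """Difference quotient (phi(q + t p) - phi(q)) / t for t > 0."""
    return (phi((q[0] + t * p[0], q[1] + t * p[1])) - phi(q)) / t

def right_derivative(phi, q, p, t=1e-8):
    """Crude approximation of the right directional derivative using a small step t."""
    return quotient(phi, q, p, t)

q = (1.0, 0.0)
# Quotients shrink with t, so the limit t -> 0+ is also the infimum over t > 0.
print([quotient(euclid, q, (0.0, 1.0), t) for t in (1.0, 0.5, 0.1)])
# Vertical direction at the kink: -Phi'_+(-p, q) is about -1, Phi'_+(p, q) is about 1.
print(-right_derivative(l1, q, (0.0, -1.0)), right_derivative(l1, q, (0.0, 1.0)))
# Horizontal direction: both values are about 1, so the two-sided derivative exists.
print(-right_derivative(l1, q, (-1.0, 0.0)), right_derivative(l1, q, (1.0, 0.0)))
```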

We next consider the other two types of directional derivatives. First,
if we take the limit (1) with the restriction $t < 0$ instead of $t > 0$,
we obtain the left directional derivative of $\Phi$, denoted
$\Phi'_-(\cdot, q)$. It is easy to see that $\Phi'_-(\cdot, q)$ can be
defined on $\mathcal{O}(q)$ and we have
$\Phi'_-(p, q) = -\Phi'_+(-p, q)$ for each $p \in \mathcal{O}(q)$. Thus
part (e) above can be rewritten as

$$\Phi'_-(p, q) \le \Phi'_+(p, q)$$

for all $p \in \mathcal{O}(q)$. On the subspace $\mathcal{O}'(q)$
introduced above in part (f), we have that

$$\Phi'_-(\cdot, q) = \Phi'_+(\cdot, q)$$

is in fact the two-sided directional derivative of $\Phi$ at $q$, denoted
$\Phi'(\cdot, q)$. The latter can be defined as the limit (1) without any
restriction on $t$. In the most important case in practice, we have that
$\mathcal{O}(q) = \mathcal{O}'(q)$. If in addition
$\mathcal{O}(q) \ne \mathrm{span}\,\mathcal{P}$, then $\Phi$ has no
standard functional derivative. For an illustration of this fact in the
context of Shannon and Hyvärinen entropies, see Section 4.

By $\mathrm{Lin}\,\mathcal{P}$ we denote the space of all real-valued linear
functionals on $\mathrm{span}\,\mathcal{P}$, i.e., the algebraic dual of
$\mathrm{span}\,\mathcal{P}$. By "$\cdot$" we denote the bilinear pairing
on $\mathrm{span}\,\mathcal{P} \times \mathrm{Lin}\,\mathcal{P}$, so if
$q \in \mathrm{span}\,\mathcal{P}$ and $q^* \in \mathrm{Lin}\,\mathcal{P}$,
$q \cdot q^*$ is the value of $q^*$ at $q$.

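Let $\Phi : \mathcal{P}^+ \to \mathbb{R}$ be 1-homogeneous. We say that
$q^* \in \mathrm{Lin}\,\mathcal{P}$ is a subgradient of $\Phi$ at $q$ if

$$\Phi(p) \ge p \cdot q^*$$

for all $p \in \mathcal{P}^+$, with equality for $p = q$. The collection of
all subgradients of $\Phi$ at $q$ is called the subdifferential of $\Phi$
at $q$ and is denoted by $\partial\Phi(q)$. A subgradient $q^*$ is strict
if and only if the inequality $\Phi(p) > p \cdot q^*$ holds for all
$p \in \mathcal{P}^+$ not positively collinear with $q$.

If $h \in \mathrm{Lin}\,\mathcal{P}$, the hyperplane $H$ in
$\mathrm{span}\,\mathcal{P} \times \mathbb{R}$ given by

$$z = p \cdot h, \quad \forall p \in \mathrm{span}\,\mathcal{P},$$

supports $\Phi$ at $q$ if the epigraph of $\Phi$ lies above $H$, and $H$
contains the point $(q, \Phi(q))$. Clearly, $H$ supports $\Phi$ at $q$ if
and only if $h \in \partial\Phi(q)$.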

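The following proposition describes the intimate connection between
one-sided and two-sided directional derivatives and the subdifferential of
a sublinear function.

Proposition 2.4. For a point $q \in \mathcal{P}^+$, we have

(a) $q^* \in \partial\Phi(q)$ if and only if

$$p \cdot q^* \le \Phi'_+(p, q)$$

for all $p \in \mathcal{P}^+$, with equality for $p = q$;

(b) if $\mathcal{D}(q) = \mathrm{span}\,\mathcal{P}$ and $\Phi'(\cdot, q)$
exists on $\mathrm{span}\,\mathcal{P}$, then
$\partial\Phi(q) = \{\Phi'(\cdot, q)\}$;

(c) if $\mathcal{D}(q) = \mathrm{span}\,\mathcal{P}$ and $\Phi'(\cdot, q)$
does not exist on $\mathrm{span}\,\mathcal{P}$, then $\partial\Phi(q)$ has
multiple elements;

(d) if $\mathcal{D}(q) \ne \mathrm{span}\,\mathcal{P}$ and $\Phi'_+(p, q)$
is finite for all $p \in \mathcal{P}^+$, then $\partial\Phi(q)$ has
multiple elements;

(e) if $\mathcal{D}(q) \ne \mathrm{span}\,\mathcal{P}$ and there is
$p \in \mathcal{P}^+$ such that $\Phi'_+(p, q) = -\infty$, then
$\partial\Phi(q) = \emptyset$.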
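The non-uniqueness in part (c) is easy to observe in a finite-dimensional toy case. The sketch below is an illustration only: it takes the $\ell^1$ norm on the whole plane (so every point is algebraically interior) and tests the subgradient inequality on a finite grid of points, which can refute a candidate but cannot prove that it is a subgradient. At the kink $q = (1, 0)$ every functional $(1, s)$ with $|s| \le 1$ passes the test, while $(1, 1.5)$ fails. The helper `is_subgradient` is hypothetical.

```python
def l1(v):
    """l1 norm, a sublinear function with a kink wherever a coordinate vanishes."""
    return sum(abs(x) for x in v)

def dot(u, v):
    """Pairing of a vector with a linear functional, both given by coordinates."""
    return sum(a * b for a, b in zip(u, v))

def is_subgradient(phi, q, q_star, points, tol=1e-12):
    """Check phi(p) >= p . q_star on the test points, with equality at p = q."""
    touches = abs(phi(q) - dot(q, q_star)) <= tol
    supports = all(phi(p) >= dot(p, q_star) - tol for p in points)
    return touches and supports

# Finite grid of test points in the plane.
grid = [(x / 2.0, y / 2.0) for x in range(-4, 5) for y in range(-4, 5)]
q = (1.0, 0.0)
for s in (-1.0, 0.0, 0.5, 1.0, 1.5):
    print(s, is_subgradient(l1, q, (1.0, s), grid))
```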

Part (a) above is the standard characterisation of the subdifferential
of a sublinear function. Parts (b) and (c) give additional information in
the case of algebraically interior points. Parts (d) and (e) do the same
for boundary points. Notice that the latter imply the statement from the
Introduction that at boundary points either the existence or the uniqueness
of a subgradient fails. (See also Example B.3.) In the next section, we show
that uniqueness might sometimes be recovered at certain boundary points if
we confine ourselves to a regularity class such as
$\mathcal{L}(\mathcal{P})$.

We next give a formal definition of a scoring rule and elaborate some of 
its implications. 

Definition 2.5. Let $\mathcal{P}$ be a prediction set with respect to the
measure space $(\Omega, \mathcal{A}, \mu)$. Any 0-homogeneous map
$S : \mathcal{P}^+ \to \mathcal{L}(\mathcal{P})$ is called a scoring rule.

If $X$ is a random variable on $\Omega$ with unknown true density
$p \in \mathcal{P}$, then for each predictive density
$q \in \mathcal{P}^+$, $S(q)(X)$ is a random function of $X$. The
condition $S(q) \in \mathcal{L}(\mathcal{P})$ guarantees that the
expectation of $S$ is always finite. The uncertainty function associated to
$S$ is the function $\Phi : \mathcal{P}^+ \to \mathbb{R}$,
$\Phi(p) = p \cdot S(p)$. Clearly, $\Phi$ is 1-homogeneous. When $S$ is
proper, it is customary to call $\Phi$ an entropy function.

Suppose now that $S : \mathcal{P}^+ \to \mathcal{L}(\mathcal{P})$ is a
proper scoring rule with entropy $\Phi$. The condition that the expected
score of $S$ is maximised in $q$ at the true density $q = p$ means that $S$
satisfies the inequality

$$\Phi(p) \ge p \cdot S(q),$$

for each $p, q \in \mathcal{P}^+$, with equality for $q = p$. If $S$ is
strictly proper, then $p$ is the only maximiser up to a scaling factor. In
this case, the inequality above is strict for any $q$ that is not positively
collinear to $p$. So, the assumption of propriety is equivalent to $S$
being a subgradient of $\Phi$ on $\mathcal{P}^+$. Moreover, strict
propriety corresponds to strict subgradients on $\mathcal{P}^+$. The
existence of a subgradient on $\mathcal{P}^+$ implies that $\Phi$ is
sublinear, see Lemma A.1. We conclude that (strictly) proper scoring rules
are $\mathcal{P}$-integrable subgradients of (strictly) sublinear
functions. Therefore, it is reasonable in the context of scoring rules to
restrict the notion of subgradient to the class
$\mathcal{L}(\mathcal{P}) \subset \mathrm{Lin}\,\mathcal{P}$. In the next
section, and in particular in Theorem 3.1 and Theorem 3.2, we discuss the
existence and uniqueness of $\mathcal{P}$-integrable subgradients.

In some special cases, we may add to our notion of subgradient a
topological structure. Let $\mathcal{P}^+$ be a prediction cone such that
$\mathrm{span}\,\mathcal{P}$ may be identified with a normed space
$(N, \|\cdot\|)$, and let the continuous dual of $N$, denoted $N^*$, be a
subset of $\mathcal{L}(\mathcal{P})$. Suppose that
$\mathcal{P}^+ \subset C$, where $C$ is an open convex cone in $N$, and
$\Phi$ may be extended to $C$ as a continuous sublinear function.

We recall that $\Phi$ is Gâteaux differentiable at $q \in C$ if there is
$q^* \in N^*$ such that for every $p \in N$, the limit

$$p \cdot q^* = \lim_{t \to 0} \frac{\Phi(q + tp) - \Phi(q)}{t}$$

exists. The functional $q^*$ is called the Gâteaux derivative of $\Phi$ at
$q$ and is also denoted by $\nabla\Phi(q)$. Notice that by definition the
Gâteaux derivative is applicable only to interior points. See Theorems 3.3
and 3.4 for an answer to our two main questions.

If $\Phi$ is Gâteaux differentiable at $q$, taking $p = q$ in the above
limit, we recover Euler's homogeneous function theorem

$$q \cdot \nabla\Phi(q) = \Phi(q).$$


More generally, if $\Phi$ is sublinear and has a subgradient $S$ on
$\mathcal{P}^+$, then we have that $q \cdot S(q) = \Phi(q)$ for every
$q \in \mathcal{P}^+$ (Hendrickson and Buehler, 1971). The proof also
follows from Proposition 2.4 (a) and Proposition 2.3 (d). This beautiful
generalisation of Euler's theorem is only visible after extending $S$ and
$\Phi$ to denormalised densities as homogeneous functions. Suppose now that
a scoring rule $S : \mathcal{P} \to \mathcal{L}(\mathcal{P})$ is given.
Then, setting

$$S(q) = S\!\left(\frac{q}{q \cdot 1}\right)$$

for any $q \in \mathcal{P}^+$ extends $S$ as a 0-homogeneous function to
the prediction cone. Here

$$q \cdot 1 = \int_\Omega q(x)\, d\mu(x)$$

is the normalising constant of $q$. Similarly, let an entropy function
$\Phi : \mathcal{P} \to \mathbb{R}$ be given. Setting

$$\Phi(q) = (q \cdot 1)\, \Phi\!\left(\frac{q}{q \cdot 1}\right)$$

for any $q \in \mathcal{P}^+$ extends $\Phi$ as a 1-homogeneous function to
the prediction cone. See Section 4 for an illustration. Working directly
with denormalised predictive densities could also be advantageous in
numerical computation (Dawid and Musio, 2012, 2014; Hyvärinen, 2005, 2007).
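The homogeneous extensions described above can be sketched in a few lines. The example assumes a finite sample space with counting measure and the quadratic score in its positively oriented form; both choices, and all function names, are illustrative assumptions. The extended score depends only on the direction of $q$, the extended entropy scales linearly, and the two are linked by the Euler-type identity $q \cdot S(q) = \Phi(q)$.

```python
def quadratic_score(p):
    """Quadratic score of a normalised p: S(p)(x) = 2 p(x) - sum_y p(y)^2."""
    norm_sq = sum(x * x for x in p)
    return [2.0 * x - norm_sq for x in p]

def quadratic_entropy(p):
    """Entropy p . S(p) of a normalised p, which equals sum_x p(x)^2."""
    return sum(x * x for x in p)

def extend_score(score):
    """0-homogeneous extension: S(q) = S(q / (q . 1)) for unnormalised positive q."""
    def extended(q):
        mass = sum(q)  # normalising constant q . 1
        return score([x / mass for x in q])
    return extended

def extend_entropy(entropy):
    """1-homogeneous extension: Phi(q) = (q . 1) Phi(q / (q . 1))."""
    def extended(q):
        mass = sum(q)
        return mass * entropy([x / mass for x in q])
    return extended

S = extend_score(quadratic_score)
Phi = extend_entropy(quadratic_entropy)

q = [2.0, 1.0, 1.0]
euler = sum(qi * si for qi, si in zip(q, S(q)))  # q . S(q)
print(euler, Phi(q))
```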


3 Main Results 

Our first result gives a necessary and sufficient condition for existence
of a $\mathcal{P}$-integrable subgradient at a point. The result can be
easily generalised to subgradients on $\mathcal{P}^+$.

Theorem 3.1. Let $\Phi : \mathcal{P}^+ \to \mathbb{R}$ be a sublinear
function. Then $\Phi$ has a $\mathcal{P}$-integrable subgradient at a point
$q \in \mathcal{P}^+$ if and only if there is
$q^* \in \mathcal{L}(\mathcal{P})$ such that

$$p \cdot q^* \le \Phi'_+(p, q)$$

for all $p \in \mathcal{P}^+$, with equality for $p = q$.

In the light of Theorem 1.1 and the above result, we call any sublinear
function $\Phi$ an entropy if $\Phi$ has a $\mathcal{P}$-integrable
subgradient at each point of its domain. In most cases of practical
interest, one may choose the prediction cone appropriately so that
$\Phi'_+(\cdot, q) = q^*$ for some $q^* \in \mathcal{L}(\mathcal{P})$. This
means that $\Phi'_+(\cdot, q)$ is a $\mathcal{P}$-integrable subgradient of
$\Phi$ at $q$ and that $\Phi'_+(\cdot, q) = \Phi'(\cdot, q)$ is also a
two-sided directional derivative on the subspace $\mathcal{O}(q)$ of
$\mathrm{span}\,\mathcal{P}$. In our next result, we show that if
$\mathcal{O}(q)$ is a sufficiently large subspace, then
$\Phi'_+(\cdot, q)$ is the unique $\mathcal{P}$-integrable subgradient of
$\Phi$ at $q$.

Theorem 3.2. Let $\mathcal{P}$ be a prediction set and
$\Phi : \mathcal{P}^+ \to \mathbb{R}$ be a sublinear function. Suppose that
at a point $q \in \mathcal{P}^+$ the subspace $\mathcal{O}(q)$ of
$\mathrm{span}\,\mathcal{P}$ has a trivial annihilator in
$\mathcal{L}(\mathcal{P})$. If there is a
$q^* \in \mathcal{L}(\mathcal{P})$ such that

$$p \cdot q^* = \Phi'_+(p, q) \tag{2}$$

for all $p \in \mathcal{P}^+$, then $q^*$ is the unique
$\mathcal{P}$-integrable subgradient of $\Phi$ at $q$.

In the above result, the condition that $\mathcal{O}(q)$ has a trivial
annihilator in $\mathcal{L}(\mathcal{P})$ can be interpreted to say that the
set of directions at which $q \in \mathcal{P}^+$ is boundary to the cone
$\mathcal{P}^+$ is negligible. The latter condition represents an algebraic
analogue to the property of $q$ being a quasi-interior point of
$\mathcal{P}^+$, which is better known in its topological forms presented
in Borwein and Lewis (1992); Fullerton and Braunschweiger (1963). The
collection of all quasi-interior points of $\mathcal{P}^+$ is the
quasi-interior of $\mathcal{P}^+$. As an illustration, in the next section
we define Shannon and Hyvärinen entropies on positive cones with nonempty
quasi-interiors. Presently, however, we do not investigate the proposed
variant of quasi-interior in full. This analysis is not necessary for the
application of Theorem 3.2 and may be a subject of future work. Notice also
that uniqueness of subgradient is understood and valid only within the
class $\mathcal{L}(\mathcal{P})$.

We now consider the case of topological subgradients. Our main assumption
is the following:

$$\begin{cases} \mathcal{P}^+ \subset C, \text{ where } C \text{ is an open convex cone in a normed space } N, \\ \Phi : C \to \mathbb{R} \text{ is a continuous sublinear function.} \end{cases} \tag{3}$$

Theorem 3.3. If (3) holds, then $\Phi$ admits a subgradient
$S : C \to N^*$.

The result is generally known as the supporting hyperplane theorem.
For a proof see e.g. Borwein and Vanderwerff (2010); Niculescu and Persson
(2006); Rudin (1973); Zalinescu (2002). Any subgradient $S : C \to N^*$ of
$\Phi$ may be identified with a proper scoring rule on $\mathcal{P}^+$ by
restricting $S$ to $\mathcal{P}^+$.

Theorem 3.4. Assume (3). Then, $\Phi$ is Gâteaux differentiable on $C$ if
and only if $\Phi$ admits a unique subgradient $S : C \to N^*$. In this
case $S = \nabla\Phi$ is the Gâteaux derivative of $\Phi$.

This is a standard result in convex analysis. See e.g. Borwein and
Vanderwerff (2010); Zalinescu (2002). See Example B.2 for an illustration
of the case where the assumption $N^* \subset \mathcal{L}(\mathcal{P})$ is
not satisfied.

4 Applications 

In this section, we apply our main results to three important entropies:
Shannon entropy, Hyvärinen entropy, and quadratic entropy. For each
entropy, we investigate an appropriate domain with nonempty quasi-interior
for which we show the existence of a unique subgradient.

4.1 Shannon Entropy

The Shannon entropy function for densities on $\mathbb{R}^d$ is given by

$$\Phi(p) = \int_{\mathbb{R}^d} p(x) \ln \frac{p(x)}{p \cdot 1}\, dx, \tag{4}$$

where $p(x) \ge 0$ is assumed to be sufficiently regular. More facts about
Shannon entropy may be found e.g. in Brier (1950); Dawid (2007); Dawid et
al. (2012); Parry et al. (2012).

We first show that Shannon entropy may only be defined for nonnegative
functions in a natural way. The kernel of $\Phi$ is the function
$\phi(t) = t \ln t$ for $t > 0$ and $\phi(0) = 0$. Clearly, $\phi(t)$ is
strictly convex on $t \ge 0$ since, for $t > 0$, $\phi''(t) = 1/t > 0$,
and $\phi$ is continuous at the endpoint $t = 0$. Notice that $\phi$ has a
vertical tangent at $t = 0$ since $\phi'(t) = \ln t + 1$. We conclude that
$\phi(t)$ cannot be extended as a convex function to $t < 0$. This
furnishes our claim.

The positive cone of $L^p(\mathbb{R}^d)$ consists of all nonnegative
functions in $L^p(\mathbb{R}^d)$ and is denoted by $L^p_+(\mathbb{R}^d)$.
In Proposition B.1 we give a direct proof that $L^p_+(\mathbb{R}^d)$ is a
nowhere dense subset of $L^p(\mathbb{R}^d)$. Since the domain of Shannon
entropy is a subset of $L^1_+(\mathbb{R}^d)$, it too is a nowhere dense
set.

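We now proceed to find a suitable prediction set. For $a > d + 1$, we set

$$\mathcal{P}^+ = \left\{ p \in C(\mathbb{R}^d) \;\middle|\; p(x) > 0,\ \exists C_1, C_2 > 0 : \frac{C_1}{(1 + |x|)^{a}} \le p(x) \le \frac{C_2}{(1 + |x|)^{d+1}} \right\}.$$

Notice that
$\mathcal{L}(\mathcal{P}) \subset L^1_{\mathrm{loc}}(\mathbb{R}^d)$.
Indeed, for any $f \in \mathcal{L}(\mathcal{P})$ consider

$$p_t(x) = \begin{cases} 1, & 0 \le |x| \le t, \\ \left( \dfrac{t}{|x|} \right)^{d+1}, & t < |x|. \end{cases}$$

Since $p_t \in \mathcal{P}^+$, the $\mathcal{P}$-integrability of $f$
implies that

$$\int_{|x| \le t} |f(x)|\, dx < \infty$$

for all $t > 0$.

Let us next see that for any $q \in \mathcal{P}^+$, $\mathcal{O}(q)$ has a
trivial annihilator in $\mathcal{L}(\mathcal{P})$. Clearly,
$\mathcal{O}(q)$ contains all $p \in \mathrm{span}\,\mathcal{P}$ that have
faster or equal decay at infinity compared to $q$. Suppose that
$f \in \mathcal{O}(q)^\perp$. Choosing an appropriate approximation of the
identity, $\{p_n\}$, $p_n \in \mathcal{O}(q)$, we get that
$f * p_n(x) \to f(x)$ for every $x$ in the Lebesgue set of $f$. Hence
$f = 0$ a.e. on $\mathbb{R}^d$. We conclude that
$\mathcal{O}(q)^\perp = \{0\}$.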

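After this preparation, we may now define $\Phi$ rigorously as the map from
$\mathcal{P}^+$ to $\mathbb{R}$ given by (4). Strict convexity of $\Phi$
follows from the strict convexity of $t \ln t$, for $t > 0$, while its
1-homogeneity is trivial. Therefore, $\Phi$ is strictly sublinear on
$\mathcal{P}^+$. Let us compute the right directional derivative of
$\Phi$. For $q \in \mathcal{P}^+$ and $p \in \mathcal{D}(q)$, we set
$q_t = q + tp$. We have

$$\lim_{t \to 0+} \frac{\Phi(q + tp) - \Phi(q)}{t}
= \int_{\mathbb{R}^d} \frac{d}{dt}\Big|_{t=0} \left( q_t \ln \frac{q_t}{q_t \cdot 1} \right) dx
= \int_{\mathbb{R}^d} \left( p \ln \frac{q}{q \cdot 1} + q \left( \frac{p}{q} - \frac{p \cdot 1}{q \cdot 1} \right) \right) dx.$$

The second term integrates to $p \cdot 1 - p \cdot 1 = 0$. Therefore,

$$\Phi'_+(p, q) = \int_{\mathbb{R}^d} p(x) \ln \frac{q(x)}{q \cdot 1}\, dx.$$

Clearly, the function

$$S(q)(x) = \ln \frac{q(x)}{q \cdot 1}$$

is in $\mathcal{L}(\mathcal{P})$. Indeed, the claim follows from the fact
that $S(q)$ is continuous in $x$ and grows logarithmically as
$|x| \to \infty$. In view of Theorem 3.2, $S$ is the unique
$\mathcal{P}$-integrable subgradient of $\Phi$ on $\mathcal{P}^+$ since
$\Phi'_+(p, q) = p \cdot S(q)$ for every $p, q \in \mathcal{P}^+$. The map
$S$ is known as the logarithmic scoring rule.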
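The identity between the right directional derivative of Shannon entropy and the pairing with the logarithmic score can be sanity-checked numerically. The sketch below replaces $\mathbb{R}^d$ by a finite sample space with counting measure, which is an assumption made purely for illustration, and compares a finite-difference quotient with $\sum_i p_i \ln(q_i / \sum_j q_j)$. All names are hypothetical.

```python
import math

def shannon(q):
    """Discrete 1-homogeneous Shannon entropy: sum_i q_i ln(q_i / sum_j q_j)."""
    total = sum(q)
    return sum(x * math.log(x / total) for x in q)

def log_score(q):
    """Discrete logarithmic score on the cone: S(q)_i = ln(q_i / sum_j q_j)."""
    total = sum(q)
    return [math.log(x / total) for x in q]

def right_derivative(q, p, t=1e-7):
    """Finite-difference approximation of the right directional derivative at q along p."""
    shifted = [a + t * b for a, b in zip(q, p)]
    return (shannon(shifted) - shannon(q)) / t

q = [2.0, 1.0, 3.0]   # unnormalised point of the cone
p = [1.0, 4.0, 0.5]   # direction, itself a point of the cone
pairing = sum(pi * si for pi, si in zip(p, log_score(q)))  # p . S(q)
print(right_derivative(q, p), pairing)
print(shannon(p) >= pairing)  # subgradient inequality Phi(p) >= p . S(q)
```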

The uniqueness of the logarithmic scoring rule as a subgradient of Shannon
entropy is in no way an absolute fact. Using the Hahn-Banach theorem as
illustrated in Example B.3 and the fact that $\mathcal{P}^+$ consists
entirely of boundary points, one may construct other subgradients of
$\Phi$ that lie outside $\mathcal{L}(\mathcal{P})$. Moreover, if $q$ lies
on the quasi-boundary of $\mathcal{P}^+$ (i.e. the points where the
condition $\mathcal{O}(q)^\perp = \{0\}$ is violated), then uniqueness will
fail even within $\mathcal{L}(\mathcal{P})$.


4.2 Hyvärinen Entropy

Hyvärinen entropy for densities on $\mathbb{R}^d$ is defined as

$$\Phi(p) = \int_{\mathbb{R}^d} \frac{|\nabla p(x)|^2}{p(x)}\, dx. \tag{5}$$

Here $\nabla$ is the gradient on $\mathbb{R}^d$. Hyvärinen and related
entropies are considered e.g. in Dawid and Musio (2012); Ehm and Gneiting
(2012); Forbes and Lauritzen (2014); Hyvärinen (2005, 2007); Parry et al.
(2012); Sanchez-Moreno et al. (2012).

We first show that there is no natural way to extend Hyvärinen entropy
to signed densities. For simplicity, we confine ourselves to the case
$d = 1$. Suppose that $p$ changes sign at some $x_0 \in \mathbb{R}$ that
has multiplicity one. The assumption is generic and it means that $x_0$ is
not an inflection point of $p$. It follows that the above integral is
divergent at $x_0$. Indeed, the claim is a direct consequence of the
asymptotic expansion of the term

$$\frac{|p'(x)|^2}{p(x)} = \frac{p'(x_0) + O(x - x_0)}{x - x_0}$$

near $x_0$. On the other hand, if $p$ has a zero of higher multiplicity at
$x_0$, one may check that the above asymptotics will be bounded and the
integral will be convergent in a neighbourhood of $x_0$. Nevertheless, the
example shows that $\Phi$ cannot be generally defined for densities that
change sign.

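We proceed to define a suitable domain for $\Phi$. Suppose that
$\mathcal{P}^+$ consists of all positive, twice continuously differentiable
functions $p(x)$ on $\mathbb{R}^d$ that satisfy the bounds:

(a) there are $C_1 > 0$ and $k > 0$ such that

$$\left| \frac{\nabla p(x)}{p(x)} \right|,\ \left| \frac{\Delta p(x)}{p(x)} \right| \le C_1 (1 + |x|)^k;$$

(b) there is $C_2 > 0$ such that

$$|p(x)| \le \frac{C_2}{(1 + |x|)^{d + 2k + 1}},$$

where $\Delta = \partial^2/\partial x_1^2 + \dots + \partial^2/\partial x_d^2$
is the Laplacian on $\mathbb{R}^d$. In view of the above, we have the
following limit

$$\lim_{R \to \infty} \frac{1}{R} \int_{|y| = R} \left( \frac{y \cdot \nabla q(y)}{q(y)} \right) p(y)\, dy = 0 \tag{6}$$

for any $p, q \in \mathcal{P}^+$. Note that here

$$y \cdot \nabla q(y) = y_1 \frac{\partial q(y)}{\partial y_1} + \dots + y_d \frac{\partial q(y)}{\partial y_d}$$

denotes the scalar product of $y$ and $\nabla q(y)$ and the integral in (6)
is a surface integral over the sphere centred at the origin of radius $R$.
The class $\mathcal{P}$ is broad, e.g. it contains the Gaussians, and all
positive continuous densities that have bounded first and second-order
derivatives and decay at infinity sufficiently fast. Just like in
Section 4.1, we have that
$\mathcal{L}(\mathcal{P}) \subset L^1_{\mathrm{loc}}(\mathbb{R}^d)$ and
that for any $q \in \mathcal{P}^+$ the annihilator of $\mathcal{O}(q)$ in
$\mathcal{L}(\mathcal{P})$ is trivial. In the light of Proposition B.1,
$\mathcal{P}^+$ is nowhere dense in $L^1(\mathbb{R}^d)$ as
$\mathcal{P}^+ \subset L^1_+(\mathbb{R}^d)$.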

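We now formally define Hyvärinen entropy as the map from $\mathcal{P}^+$ to
$\mathbb{R}$ given in (5). Convexity of $\Phi$ follows from the convexity
of the function

$$\phi(t, \xi_1, \dots, \xi_d) = \frac{\xi_1^2 + \dots + \xi_d^2}{t}, \quad \text{for } t > 0,\ (\xi_1, \dots, \xi_d) \in \mathbb{R}^d,$$

while its 1-homogeneity is trivial. Hence, $\Phi$ is sublinear. Let us
compute its right directional derivative. For $q \in \mathcal{P}^+$ and
$p \in \mathcal{D}(q)$, we set $q_t = q + tp$. We have

$$\lim_{t \to 0+} \frac{\Phi(q + tp) - \Phi(q)}{t}
= \int_{\mathbb{R}^d} \frac{d}{dt}\Big|_{t=0} \frac{|\nabla q_t(x)|^2}{q_t(x)}\, dx
= \int_{\mathbb{R}^d} \left( 2\, \frac{\nabla q(x)}{q(x)} \cdot \frac{\nabla p(x)}{p(x)} - \frac{|\nabla q(x)|^2}{q^2(x)} \right) p(x)\, dx.$$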

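By integration by parts we get

$$\int_{|x| < R} \left( \frac{2 \nabla q(x) \cdot \nabla p(x)}{q(x)} - \frac{|\nabla q(x)|^2}{q^2(x)}\, p(x) \right) dx
= \int_{|x| < R} \left( -\frac{2 \Delta q(x)}{q(x)} + \frac{|\nabla q(x)|^2}{q^2(x)} \right) p(x)\, dx
+ \frac{2}{R} \int_{|y| = R} \left( \frac{y \cdot \nabla q(y)}{q(y)} \right) p(y)\, dy.$$

Letting $R \to \infty$ and using (6), we obtain

$$\Phi'_+(p, q) = \int_{\mathbb{R}^d} \left( -\frac{2 \Delta q(x)}{q(x)} + \frac{|\nabla q(x)|^2}{q^2(x)} \right) p(x)\, dx.$$

The assumptions on $\mathcal{P}^+$ guarantee that

$$S(q)(x) = -\frac{2 \Delta q(x)}{q(x)} + \frac{|\nabla q(x)|^2}{q^2(x)}$$

is $\mathcal{P}$-integrable for every $q \in \mathcal{P}^+$. In view of
Theorem 3.2, $S(q)$ is the unique $\mathcal{P}$-integrable subgradient of
$\Phi$ on $\mathcal{P}^+$. The map $S$ is known as the Hyvärinen scoring
rule (Parry et al., 2012).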

In fact, $S(q)$ is a strict subgradient of $\Phi$ on $\mathcal{P}^+$. This can be
shown if we notice that the divergence induced by $S$ has the representation

\[
p \cdot S(p) - p \cdot S(q) = \int_{\mathbb{R}^d} \left| \frac{\nabla p(x)}{p(x)} - \frac{\nabla q(x)}{q(x)} \right|^2 p(x)\, dx.
\]

The latter identity can be proved by integration by parts. The divergence
is zero if and only if

\[
\nabla(\ln p(x) - \ln q(x)) = 0.
\]

This is equivalent to $p = Cq$ for some constant $C > 0$, i.e., $p$ and $q$ being
positively collinear. This concludes the proof of the claim.
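
As a numerical sanity check of the divergence representation above, the following minimal sketch compares its two sides for one-dimensional Gaussian densities. It assumes $d = 1$, the sign convention $S(q) = -2\Delta q/q + |\nabla q|^2/q^2$, and a plain trapezoidal rule on a truncated interval. All function names are hypothetical and are not taken from the paper.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D Gaussian with the given mean and standard deviation."""
    z = (x - mean) / std
    return math.exp(-0.5 * z * z) / (std * math.sqrt(2.0 * math.pi))


def grad_log_density(x, mean, std):
    """q'(x)/q(x) for a 1-D Gaussian q."""
    return -(x - mean) / std ** 2


def hyvarinen_score(x, mean, std):
    """S(q)(x) = -2 q''(x)/q(x) + (q'(x)/q(x))^2 for a 1-D Gaussian q."""
    g = grad_log_density(x, mean, std)
    laplacian_over_q = g * g - 1.0 / std ** 2  # q''/q for a Gaussian
    return -2.0 * laplacian_over_q + g * g


def integrate(f, lo=-20.0, hi=20.0, n=40_000):
    """Composite trapezoidal rule on [lo, hi]; the truncation is an assumption."""
    h = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * h)
    return total * h


def divergence_direct(p, q):
    """Right-hand side: integral of |p'/p - q'/q|^2 p."""
    return integrate(
        lambda x: (grad_log_density(x, *p) - grad_log_density(x, *q)) ** 2
        * gaussian_pdf(x, *p)
    )


def divergence_from_scores(p, q):
    """Left-hand side: p . S(p) - p . S(q)."""
    return integrate(
        lambda x: (hyvarinen_score(x, *p) - hyvarinen_score(x, *q))
        * gaussian_pdf(x, *p)
    )


p_params = (0.0, 1.0)  # (mean, std) of p
q_params = (1.0, 2.0)  # (mean, std) of q
lhs = divergence_from_scores(p_params, q_params)
rhs = divergence_direct(p_params, q_params)
```

For these two Gaussians both quantities agree with the closed-form value 0.625, and the divergence vanishes when the two densities coincide.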
4.3 Quadratic Entropy

Here we consider the quadratic entropy

\[
\Phi(q) = \int_\Omega q^2(x)\, dx, \tag{7}
\]

where $(\Omega, \mathcal{A}, \mu)$ is a Lebesgue measure space with
$\Omega \subset \mathbb{R}^d$. In what follows, we show that its Gateaux derivative
is the quadratic scoring rule, also known as the Brier score. The quadratic
entropy is a member of the important family of power entropy functions. The
corresponding power scoring rules have been studied in connection to robust
estimation, e.g. in Basu et al. (1998); Kanamori and Fujisawa (2014, 2015).

We proceed to choose a suitable domain for $\Phi$. In contrast to the previous
two entropies we now introduce a topology. To that end, we begin with a
description of some normed spaces. Let $w : \Omega \to [0, \infty)$ be a measurable
function which we call a weight. By $L^p(\Omega, w)$, for $p \ge 1$, we denote the
Lebesgue space of functions on $\Omega$ whose $p$-th power is absolutely integrable
with respect to the weight $w(x)$. By $\|\cdot\|_{p,w}$ we denote the corresponding
weighted $L^p$-norm. When $w$ is identically equal to one we get the usual
Lebesgue space and norm. In this case we drop $w$ from our notation. We
now set

\[
w(x) = (1 + |x|)^{d+1}.
\]

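Notice that $L^2(\Omega, w)$ embeds continuously in $L^1(\Omega)$. Indeed, for
$f \in L^2(\Omega, w)$, we have

\[
\begin{aligned}
\int_\Omega |f(x)|\, dx &= \int_\Omega w^{-1/2}(x)\, |f(x)|\, w^{1/2}(x)\, dx \\
&\le \left( \int_\Omega w^{-1}(x)\, dx \right)^{1/2} \left( \int_\Omega |f(x)|^2 w(x)\, dx \right)^{1/2} \\
&\le C \|f\|_{2,w},
\end{aligned}
\]

where $C > 0$ is a constant. Clearly, $L^2(\Omega, w)$ also embeds continuously in
$L^2(\Omega)$ and hence the same conclusion holds for $L^2(\Omega, w)$ for all
intermediate spaces $L^p(\Omega)$ with $1 < p < 2$. Hence, we have the inequality

\[
\|f\|_p \le C \|f\|_{2,w}
\]

for some fixed $C > 0$ and all $p \in [1, 2]$.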

We have that $f \in L^2(\Omega, w)$ if and only if $f w^{1/2} \in L^2(\Omega)$.
Clearly, the weight is needed only when $\Omega$ is unbounded as otherwise the
weighted and the ordinary $L^2$-norms are equivalent. The continuous dual space
of $L^2(\Omega, w)$ may be identified with the space $L^2(\Omega, w^{-1})$.
Therefore, $g \in L^2(\Omega, w^{-1})$ if and only if $g w^{-1/2} \in L^2(\Omega)$.
Hence, the dual space $L^2(\Omega, w^{-1})$ contains the constants and also the
elements of $L^2(\Omega, w)$.

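We now specify a prediction set $\mathcal{P} \subset L^2(\Omega, w)$ with the
following property: there are constants $k_1 > 0$ and $k_2 > 0$ such that

\[
k_1 \le \|q\|_{2,w} \le k_2
\]

for all $q \in \mathcal{P}$. Choose $0 < \varepsilon < \min(1, k_1)$. For
$p \in L^2(\Omega, w)$, let $B_\rho(p)$ denote the open ball about $p$ of radius
$\rho > 0$. Choose $\delta > 0$ so small that for every $p \in B_\delta(0)$ we have
$\|p\|_1 < \varepsilon$ and $\|p\|_{2,w} < \varepsilon$. Let $q \in \mathcal{P}$ and
consider $r \in B_\delta(q)$. It is easy to show that

\[
k_1 - \varepsilon < \|r\|_{2,w} < k_2 + \varepsilon
\]

for all $r \in B_\delta(q)$. Similarly, we also have

\[
1 - \varepsilon < r \cdot 1 < 1 + \varepsilon
\]

for all $r \in B_\delta(q)$. Here we have used the fact that $r = p + q$, where
$q \cdot 1 = 1$ and $\|p\|_1 < \varepsilon$. We now set

\[
C_0 = \mathcal{P} + B_\delta(0) = \bigcup_{q \in \mathcal{P}} B_\delta(q).
\]

It follows that $C_0$ is convex as both $\mathcal{P}$ and $B_\delta(0)$ are convex.
Finally, let $C = C_0^+$ be the cone of $C_0$. Clearly, $C$ is an open convex cone
in $L^2(\Omega, w)$.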

We may now formally define $\Phi$ as the map from $C$ to $\mathbb{R}$ given by (7).
We have that $\Phi$ is strictly convex on $C_0$ as the kernel function
$\phi(t) = t^2$ is strictly convex for $t \in \mathbb{R}$. Therefore, $\Phi$ is
strictly sublinear on $C$. It is not hard to see that $\Phi$ is also continuous on
$C$. Theorem 3.3 implies that $\Phi$ has a subgradient on $C$. The following
computation shows that $\Phi$ is Gateaux differentiable. Indeed, for $q \in C$ and
$p \in L^2(\Omega, w)$, we have




We obtain that 
is the Gateaux derivative of $\Phi$ as clearly
$\nabla\Phi(q) \in L^2(\Omega, w^{-1})$. In view of Theorem 3.4,
$S = \nabla\Phi|_{\mathcal{P}^+}$ defines a strictly proper scoring rule on
$\mathcal{P}^+$. We have that $\nabla\Phi$ is the unique subgradient of quadratic
entropy on the cone $C$, but as discussed before, by using the Hahn-Banach
theorem one may show that uniqueness fails on $\mathcal{P}^+$ when $\Omega$ is
unbounded. The rule $S$ is known as the quadratic scoring rule.
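
The relationship between a quadratic entropy and the quadratic scoring rule can be illustrated on a finite outcome space. The sketch below is not the weighted $L^2$ construction of this section: it assumes a counting measure on three outcomes and the 1-homogeneous entropy $\Phi(q) = \sum_i q_i^2 / \sum_i q_i$, whose gradient at a normalised $q$ is the familiar score $2q_i - \sum_j q_j^2$. Both the entropy formula and the function names are assumptions made for illustration.

```python
def quadratic_entropy(q):
    """1-homogeneous quadratic entropy sum(q_i^2) / sum(q_i) (an assumption)."""
    return sum(v * v for v in q) / sum(q)


def quadratic_score(q):
    """Gradient of quadratic_entropy at q; equals 2 q_i - sum(q_j^2) if sum(q) = 1."""
    total = sum(q)
    squares = sum(v * v for v in q)
    return [2.0 * v / total - squares / total ** 2 for v in q]


def dot(p, s):
    """Pairing p . s on a finite outcome space."""
    return sum(a * b for a, b in zip(p, s))


def gateaux_derivative(entropy, q, direction, tau=1e-6):
    """Symmetric difference quotient of entropy at q along a signed direction."""
    plus = [a + tau * b for a, b in zip(q, direction)]
    minus = [a - tau * b for a, b in zip(q, direction)]
    return (entropy(plus) - entropy(minus)) / (2.0 * tau)


p = [0.2, 0.5, 0.3]
q = [0.1, 0.6, 0.3]
signed_direction = [1.0, -1.0, 0.0]  # not a density: q is an interior point

divergence = dot(p, quadratic_score(p)) - dot(p, quadratic_score(q))
squared_distance = sum((a - b) ** 2 for a, b in zip(p, q))
numeric_derivative = gateaux_derivative(quadratic_entropy, q, signed_direction)
exact_derivative = dot(signed_direction, quadratic_score(q))
```

In this finite setting the divergence equals the squared Euclidean distance between $p$ and $q$, which is zero only for $p = q$, and the difference quotient along a signed direction matches the pairing with the score.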

5 Conclusion 

We were originally motivated to understand the implications of the fact that
Shannon and Hyvärinen entropies are only finite on domains with empty
interiors. As no notion of functional derivative is applicable to these
entropies, the question whether the logarithmic and Hyvärinen scoring rules
are the unique subgradients of their respective entropy functions is not
obvious. In contrast, the quadratic entropy may be continuously extended to
signed densities, which allows us to interpret the quadratic scoring rule as
the Gateaux derivative of its entropy. We realised that in order to answer
the titular question of the paper, one must introduce additional structures
to the basic measure-theoretic framework known in the literature of scoring
rules (Hendrickson and Buehler, 1971). The most important new aspect is
the notion of interior and its refinement (known as quasi-interior) in the
context of domains with empty interior. Another crucially important idea is
to use directional derivatives to describe the subdifferentials of entropy
functions. Finally, our approach marks a shift in emphasis from proper scoring
rules to a greater focus on entropy functions.

A Proofs 

Lemma A.1. Let $\mathcal{P}$ be a prediction set and
$\Phi : \mathcal{P}^+ \to \mathbb{R}$ be a 1-homogeneous function. If $\Phi$ has a
(strict) subgradient on $\mathcal{P}^+$, then $\Phi$ is a (strictly) sublinear
function.

Proof. Let $S : \mathcal{P}^+ \to \mathrm{Lin}\,\mathcal{P}$ be a (strict)
subgradient of $\Phi$. Then $S$ (strictly) satisfies

\[
\begin{aligned}
\Phi(p) &\ge p \cdot S((1 - \lambda)p + \lambda q), \\
\Phi(q) &\ge q \cdot S((1 - \lambda)p + \lambda q)
\end{aligned}
\]

for every $p, q \in \mathcal{P}^+$ ($p$ and $q$ not positively collinear), and every
$0 < \lambda < 1$. Multiplying the first inequality by $1 - \lambda$, the second one
by $\lambda$, adding them up, and using $r \cdot S(r) = \Phi(r)$ with
$r = (1 - \lambda)p + \lambda q$, we obtain that $\Phi$ (strictly) satisfies

\[
\Phi((1 - \lambda)p + \lambda q) \le (1 - \lambda)\Phi(p) + \lambda\Phi(q). \qquad \square
\]


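Proof of Lemma 2.1. We first show that
$\mathrm{cone}(\mathcal{P}^+ - q) \subset \mathcal{D}(q)$. It is easy to see that
$\mathcal{D}(q)$ is closed under taking conic combinations. The claim follows
from the fact that $(\mathcal{P}^+ - q) \subset \mathcal{D}(q)$. We now show that
$\mathcal{D}(q) \subset \mathrm{cone}(\mathcal{P}^+ - q)$. If $p \in \mathcal{D}(q)$,
then there is $c_p > 0$ and $r \in \mathcal{P}^+$ such that $q + c_p p = r$. Then
$p = (r - q)c_p^{-1}$ and hence $p \in \mathrm{cone}(\mathcal{P}^+ - q)$. $\square$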

Proof of Proposition 2.3. (a) For $p \in \mathcal{D}(q)$ arbitrary, consider the
line in $\mathrm{span}\,\mathcal{P}$ with parametric equation

\[
\gamma(t) = q + t(p - q), \quad t \in \mathbb{R},
\]

passing through $q$ and $p$. Clearly, $\gamma(0) = q$ and $\gamma(1) = p$. Moreover,
there is some $\varepsilon > 0$ such that the interval $[0, \varepsilon]$ is mapped
entirely in $\mathcal{P}^+$ under $\gamma$ (if $p \in \mathcal{P}^+$, then
$\varepsilon \ge 1$). Then the function

\[
\phi(t) = \Phi(q + t(p - q)), \quad t \in [0, \varepsilon],
\]

is convex and its slope function

\[
\frac{\phi(t_2) - \phi(t_1)}{t_2 - t_1}, \quad t_1, t_2 \in [0, \varepsilon],
\]

is nondecreasing (Niculescu and Persson, 2006; Rockafellar, 1972). We have
that

\[
\lim_{t_2 \to 0^+} \frac{\phi(t_2) - \phi(0)}{t_2} = \inf_{t_2 > 0} \frac{\phi(t_2) - \phi(0)}{t_2}.
\]

If $p \in \mathcal{O}(q)$, then there is some $\delta > 0$ such that the interval
$[-\delta, \delta]$ is mapped entirely in $\mathcal{P}^+$ under $\gamma$. Let
$-\delta < t_1 < 0 < t_2 < \delta$. To prove that $\Phi'_+(p, q)$ is finite, we
consider

\[
\frac{\phi(0) - \phi(t_1)}{-t_1} \le \frac{\phi(t_2) - \phi(0)}{t_2}
\]

and take the infimum in $t_2$.

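(b) Homogeneity of $\Phi'_+(\cdot, q)$ follows from:

\[
\Phi'_+(\lambda p, q)
= \lim_{\tau \to 0^+} \frac{\Phi(q + \tau\lambda p) - \Phi(q)}{\tau}
= \lambda \lim_{\tau \to 0^+} \frac{\Phi(q + \lambda\tau p) - \Phi(q)}{\lambda\tau}
= \lambda \Phi'_+(p, q).
\]

Let $p_1, p_2 \in \mathcal{D}(q)$. Subadditivity of $\Phi'_+(\cdot, q)$ follows from:

\[
\begin{aligned}
\Phi'_+(p_1 + p_2, q)
&= \lim_{\tau \to 0^+} \frac{\Phi(q + \tau(p_1 + p_2)) - \Phi(q)}{\tau} \\
&\le \lim_{\tau \to 0^+} \frac{\Phi(q/2 + \tau p_1) - \Phi(q)/2}{\tau}
 + \lim_{\tau \to 0^+} \frac{\Phi(q/2 + \tau p_2) - \Phi(q)/2}{\tau} \\
&= \lim_{\tau \to 0^+} \frac{\Phi(q + 2\tau p_1) - \Phi(q)}{2\tau}
 + \lim_{\tau \to 0^+} \frac{\Phi(q + 2\tau p_2) - \Phi(q)}{2\tau} \\
&= \Phi'_+(p_1, q) + \Phi'_+(p_2, q).
\end{aligned}
\]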


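(c) The claim follows from

\[
\Phi'_+(p, \lambda q)
= \lim_{\tau \to 0^+} \frac{\Phi(\lambda q + \tau p) - \Phi(\lambda q)}{\tau}
= \lim_{\tau \to 0^+} \frac{\Phi(q + \tau p/\lambda) - \Phi(q)}{\tau/\lambda}
= \Phi'_+(p, q).
\]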

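(d) We have

\[
\Phi(p) \ge \Phi(q + p) - \Phi(q) \ge \frac{\Phi(q + \tau p) - \Phi(q)}{\tau} \ge \Phi'_+(p, q),
\]

where $0 < \tau < 1$. The first inequality follows from sublinearity of $\Phi$,
while the second and third follow from the fact that the slope function of
$\Phi$ is nondecreasing. It remains to show that $\Phi(q) = \Phi'_+(q, q)$. This
follows immediately from

\[
\Phi(q) = \lim_{\tau \to 0^+} \frac{(1 + \tau)\Phi(q) - \Phi(q)}{\tau}
= \lim_{\tau \to 0^+} \frac{\Phi(q + \tau q) - \Phi(q)}{\tau}
= \Phi'_+(q, q).
\]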


(e) The claim is a direct consequence of

\[
0 = \Phi'_+(0, q) = \Phi'_+(p - p, q) \le \Phi'_+(p, q) + \Phi'_+(-p, q).
\]

(f) To show that $\mathcal{O}'(q)$ is a linear subspace of $\mathcal{O}(q)$ it is
enough to show that it is closed under scalar multiplication and vector
addition. Let $\lambda \in \mathbb{R}$ and $p \in \mathcal{O}'(q)$. Then, for
$\lambda > 0$, $\Phi'_+(\lambda p, q) = \lambda\Phi'_+(p, q)$. Analogously, for
$\lambda < 0$ we have

\[
\Phi'_+(\lambda p, q) = \Phi'_+(-\lambda(-p), q) = -\lambda\Phi'_+(-p, q) = \lambda(-\Phi'_+(-p, q)) = \lambda\Phi'_+(p, q).
\]

Therefore, $\Phi'_+(\lambda p, q) = \lambda\Phi'_+(p, q)$ for any
$\lambda \in \mathbb{R}$ and $p \in \mathcal{O}'(q)$. Then multiplying by $\lambda$
both sides of the identity

\[
\Phi'_+(p, q) = -\Phi'_+(-p, q)
\]

and using the previous identity, we get that $\lambda p \in \mathcal{O}'(q)$. Hence,
$\mathcal{O}'(q)$ is closed under scalar multiplication.

Suppose now that $p, r \in \mathcal{O}'(q)$. We have

\[
\begin{aligned}
\Phi'_+(p + r, q) &\le \Phi'_+(p, q) + \Phi'_+(r, q) = -(\Phi'_+(-p, q) + \Phi'_+(-r, q)) \\
&\le -\Phi'_+(-p - r, q) \le \Phi'_+(p + r, q),
\end{aligned}
\]

where the last inequality follows from (e). Clearly, we must have equalities
throughout. In particular,

\[
-\Phi'_+(-p - r, q) = \Phi'_+(p + r, q)
\]

and

\[
\Phi'_+(p + r, q) = \Phi'_+(p, q) + \Phi'_+(r, q).
\]

Hence $p + r \in \mathcal{O}'(q)$. We conclude that $\mathcal{O}'(q)$ is a linear
subspace and $\Phi'_+(\cdot, q)$ is linear. $\square$

Proof of Proposition 2.4. (a) The sufficient part of the claim follows from
Proposition 2.3 (d). Let us now show the necessary part. To that end, let
$q^* \in \mathrm{Lin}\,\mathcal{P}$ be a subgradient of $\Phi$ at $q$, and let
$p \in \mathcal{P}^+$ be arbitrary. Setting $q_t = q + (1 - t)p$, we have
$\Phi(q_t) \ge q_t \cdot q^*$ for all $t \in [0, 1]$. Subtracting
$\Phi(q) = q \cdot q^*$ from both sides of the inequality and dividing by
$(1 - t)$, for $t \in (0, 1)$, we get

\[
\frac{\Phi(q + (1 - t)p) - \Phi(q)}{1 - t} \ge p \cdot q^*.
\]

Letting $t \uparrow 1$ we get

\[
\Phi'_+(p, q) \ge p \cdot q^*
\]

as desired.

(b) The claim follows by restricting $\Phi$ to 1-dimensional affine spaces
through $q$. On these spaces $\Phi$ is convex and differentiable and therefore has
a unique subgradient. Since these subspaces cover the whole of
$\mathrm{span}\,\mathcal{P}$, it follows that the directional derivative
$\Phi'(\cdot, q)$ is the unique subgradient of $\Phi$ there.

(c) In view of Proposition 2.3 (a), $\Phi'_+(p, q)$ is finite for each
$p \in \mathcal{O}(q) = \mathrm{span}\,\mathcal{P}$. The hypothesis implies that
there is at least one 1-dimensional linear subspace of
$\mathrm{span}\,\mathcal{P}$ on which $\Phi'_+(\cdot, q)$ is not linear. There are
infinitely many ways we can choose a linear function on that space that is
dominated by $\Phi'_+(\cdot, q)$. The claim now follows from the Hahn-Banach
theorem stated below as Theorem B.4.

(d) Since $\mathcal{O}(q) \ne \mathrm{span}\,\mathcal{P}$, it follows that
$\mathcal{D}(q) \setminus \mathcal{O}(q)$ is nonempty. Take any $p$ in that set and
consider the 1-dimensional linear space generated by the span of $p$. Since
$\Phi'_+(\cdot, q)$ is defined only on its positive half-space, there are
infinitely many linear functions that are dominated by $\Phi'_+(\cdot, q)$ on the
whole space. The proof now follows from Theorem B.4.

(e) There is no element of $\mathrm{Lin}\,\mathcal{P}$ that satisfies the condition
in part (a) of this proposition. Therefore, $\partial\Phi(q) = \emptyset$. $\square$

Proof of Theorem 3.1. Suppose that $q^* \in \mathcal{L}(\mathcal{P})$ satisfies
$p \cdot q^* \le \Phi'_+(p, q)$ for all $p \in \mathcal{P}^+$ with equality for
$p = q$. In view of Proposition 2.3 (d), we have that $p \cdot q^* \le \Phi(p)$ for
all $p \in \mathcal{P}^+$ and $q \cdot q^* = \Phi(q)$. Hence, $q^*$ is a
$\mathcal{P}$-integrable subgradient of $\Phi$ at $q$.

The converse claim, that is, if $q^*$ is a $\mathcal{P}$-integrable subgradient of
$\Phi$ at $q$, then $p \cdot q^* \le \Phi'_+(p, q)$ for all $p \in \mathcal{P}^+$,
with equality for $p = q$, follows from Proposition 2.4 (a). $\square$

Proof of Theorem 3.2. The hypothesis implies that $\Phi'_+(\cdot, q)$ is linear on
$\mathcal{O}(q) \subset \mathrm{span}\,\mathcal{P}$. By restricting to 1-dimensional
subspaces of $\mathcal{O}(q)$ it follows immediately that any subgradient of
$\Phi$ must agree with $q^*$ on $\mathcal{O}(q)$. The assumption that
$\mathcal{O}(q)^\perp = \{0\}$ implies that $\Phi$ may have at most one
$\mathcal{P}$-integrable subgradient at $q$. Then the claim follows from the fact
that $q^*$ is a subgradient of $\Phi$ at $q$. $\square$

B Some Additional Facts 

The positive cones in many standard function spaces are nowhere dense sets.
Let us show this for the Lebesgue space $L^1(\mathbb{R}^d)$. The positive cone of
$L^1(\mathbb{R}^d)$ consists of all Lebesgue integrable functions $f \ge 0$ a.e. on
$\mathbb{R}^d$ and is denoted by $L^1_+(\mathbb{R}^d)$. We recall that a set in a
topological vector space is nowhere dense if its closure has empty interior.

Proposition B.1. The positive cone of $L^1(\mathbb{R}^d)$ is nowhere dense.

Proof. We show that for every $f \ge 0$ a.e., there is $g \ge 0$ a.e. such that,
for every $\alpha > 0$, $f - \alpha g \notin L^1_+(\mathbb{R}^d)$. This means that no
open ball about $f$ is contained in $L^1_+(\mathbb{R}^d)$. Since
$L^1_+(\mathbb{R}^d)$ is closed, this implies that $L^1_+(\mathbb{R}^d)$ is nowhere
dense.

To prove our claim, we use the fact that there is no absolutely convergent
series with a slowest rate of decay at infinity. We begin by partitioning
$\mathbb{R}^d$ into dyadic regions

\[
\omega_k = \{2^k \le |x| < 2^{k+1}\}
\]

for $k \in \mathbb{Z}$. For $f \in L^1_+(\mathbb{R}^d)$ we set

\[
a_k = \int_{\omega_k} f(x)\, dx.
\]

We have that the series

\[
\sum_{k=-\infty}^{\infty} a_k = \int_{\mathbb{R}^d} f(x)\, dx
\]

is absolutely convergent. If $r_k = \sum_{i \ge k} a_i$ is the remainder of the
series for each $k$, then the series $\sum_k a_k/\sqrt{r_k}$ is also convergent
(Rudin, 1976). Notice that the ratio of the common term of the second to the
first series tends to infinity as $k \to \infty$. Therefore, the second series has
a strictly slower rate of convergence. There exists a function
$g \in L^1_+(\mathbb{R}^d)$ such that the integrals of $g$ on $\omega_k$ are
$b_k = a_k/\sqrt{r_k}$ and

\[
\sum_{k=-\infty}^{\infty} b_k = \int_{\mathbb{R}^d} g(x)\, dx.
\]

Clearly, for any $\alpha > 0$, the difference $f - \alpha g$ changes sign for some
$\omega_k$, and hence $f - \alpha g \notin L^1_+(\mathbb{R}^d)$. $\square$
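
The series fact used in this proof can be illustrated numerically. The sketch below takes the hypothetical sequence $a_k = 2^{-k}$ for $k = 0, \dots, 39$ (a truncated one-sided stand-in for the dyadic integrals of $f$), forms the tails $r_k$ and the slower sequence $b_k = a_k/\sqrt{r_k}$, and locates the first index at which $a_k - \alpha b_k$ becomes negative. The sequence, the truncation, and the function names are assumptions made for illustration.

```python
import math


def tail_sums(terms):
    """Return r_k = sum of terms[i] for i >= k, for every index k."""
    tails = [0.0] * len(terms)
    running = 0.0
    for k in range(len(terms) - 1, -1, -1):
        running += terms[k]
        tails[k] = running
    return tails


# Hypothetical masses of f on the dyadic regions (geometric decay).
a = [2.0 ** (-k) for k in range(40)]
r = tail_sums(a)

# Masses of g: same series, but with a strictly slower rate of decay.
b = [a_k / math.sqrt(r_k) for a_k, r_k in zip(a, r)]
ratios = [b_k / a_k for a_k, b_k in zip(a, b)]


def first_sign_change(alpha):
    """Smallest k with a_k - alpha * b_k < 0, i.e. where f - alpha * g turns negative."""
    return next(k for k in range(len(a)) if a[k] - alpha * b[k] < 0.0)
```

The sum of the $b_k$ stays below $2\sqrt{r_0}$, the ratios $b_k/a_k$ increase without bound as the tails shrink, and for $\alpha = 0.01$ the difference first turns negative at $k = 15$. A smaller $\alpha$ only pushes that index further out.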


The next example illustrates the notion of topological subgradient in the
case when the assumption $N^* \subset \mathcal{L}(\mathcal{P})$ is not satisfied.

Example B.2. Consider a Lebesgue measure space $(\Omega, \mathcal{A}, \mu)$ with
$\Omega$ a compact subset of $\mathbb{R}^d$. We set $\mathcal{P}^+$ to be the
positive cone of $C(\Omega)$, that is, the set of all nonnegative continuous
functions on $\Omega$. The continuous dual of $C(\Omega)$ is the space of all
real-valued Radon measures on $\Omega$. The fact that $\mathcal{P}^+$ contains
constants implies that $\mathcal{L}(\mathcal{P}) \subset L^1(\Omega)$. Actually,
$\mathcal{L}(\mathcal{P}) = L^1(\Omega)$ and hence the $\mathcal{P}$-integrable
functions are the Radon measures that have a Lebesgue density. Since
$L^1(\Omega) \subset (C(\Omega))^*$, we see that in this case the notion of a
$\mathcal{P}$-integrable subgradient is more restrictive than that of a
topological subgradient.

We proceed to examine the implications of the latter observation on a
concrete sublinear function. Let $\Phi : C(\Omega) \to \mathbb{R}$ be the supremum
function, that is,

\[
\Phi(p) = \sup_{x \in \Omega} p(x).
\]

It is easy to check that $\Phi$ is non-strictly sublinear and continuous. The
supporting hyperplane theorem guarantees the existence of a topological
subgradient of $\Phi$ at each point in its domain that is a real Radon measure.
Let us see whether the subgradient is regular enough to be a proper scoring
rule.

We first demonstrate that there are points $q \in \mathcal{P}^+$ at which $\Phi$
has no subgradient in $\mathcal{L}(\mathcal{P})$. To that end, let $\mathcal{M}(q)$
denote the set of modes of $q$, that is, the subset of $\Omega$ where $q$ reaches
its maximum. Notice that $\mathcal{M}(q)$ is always compact. It can be shown that

\[
\Phi'_+(p, q) = \sup_{x \in \mathcal{M}(q)} p(x),
\]

the proof of which is left to the reader. When $\mathcal{M}(q) = \{x_0\}$ is a
singleton, $\Phi'_+(\cdot, q) = \delta(x - x_0)$ is Dirac's delta function. Clearly,
in this case $\Phi$ is Gateaux differentiable with derivative $\delta(x - x_0)$. We
claim that $\Phi$ has no $\mathcal{P}$-integrable subgradient for any density $q$
with $\mu(\mathcal{M}(q)) = 0$.

Suppose, to the contrary, that $q^* \in \mathcal{L}(\mathcal{P})$, $q^* \ne 0$, is a
subgradient of $\Phi$ at $q$. Then

\[
\Phi'_+(p, q) \ge p \cdot q^*
\]

for all $p \in \mathcal{P}^+$. We shall show that this inequality implies
$q^*(x) \le 0$ a.e. on $\Omega$, which leads to a contradiction with
$\Phi(q) = q \cdot q^* > 0$.

To show the latter claim, notice that $\Omega \setminus \mathcal{M}(q)$ is open,
and hence for any $y \in \Omega \setminus \mathcal{M}(q)$, there is
$\varepsilon_y > 0$ such that the ball about $y$ of radius $\varepsilon_y$ lies in
the complement of $\mathcal{M}(q)$ with respect to $\Omega$. Let $\{p_k\}$ be a
sequence of densities approximating $\delta(x - y)$ entirely supported on this
ball. Since $\Phi'_+(p_k, q) = 0$, we get that $p_k \cdot q^* \le 0$. If $y$ is a
Lebesgue point of $q^*$, then we have the limit

\[
\lim_{k \to \infty} p_k \cdot q^* = \delta(\cdot - y) \cdot q^* = q^*(y).
\]

Since almost every point of $q^*$ is a Lebesgue point, we get that
$q^*(x) \le 0$ a.e. on $\Omega$. This completes the proof of the claim.

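In the case $\mu(\mathcal{M}(q)) > 0$, we may find a $\mathcal{P}$-integrable
subgradient of $\Phi$ at $q$. Consider the function

\[
q^*(x) =
\begin{cases}
1/\mu(\mathcal{M}(q)), & x \in \mathcal{M}(q), \\
0, & x \in \Omega \setminus \mathcal{M}(q).
\end{cases}
\]

Clearly, $q \cdot q^* = \sup_{x \in \Omega} q(x)$ and
$p \cdot q^* \le \sup_{x \in \Omega} p(x)$ for all $p \in \mathcal{P}^+$. This
furnishes our claim.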
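
A finite analogue makes the construction in Example B.2 concrete. The sketch below replaces $\Omega$ by three points with counting measure, which is an assumption of this illustration and sidesteps the measure-zero case discussed above. It checks that the uniform weight on the modes of $q$ supports the maximum function at $q$, and that the one-sided difference quotient agrees with the supremum of $p$ over the modes. Function names are hypothetical.

```python
def sup_entropy(p):
    """Finite analogue of the supremum function: the largest entry of p."""
    return max(p)


def modes(q, tol=1e-12):
    """Indices at which q attains its maximum (up to a small tolerance)."""
    top = max(q)
    return [i for i, v in enumerate(q) if top - v <= tol]


def uniform_subgradient(q):
    """Weight 1/|M(q)| on the modes of q and 0 elsewhere."""
    m = modes(q)
    return [1.0 / len(m) if i in m else 0.0 for i in range(len(q))]


def directional_derivative(p, q):
    """Closed form from the example: the maximum of p over the modes of q."""
    return max(p[i] for i in modes(q))


def difference_quotient(p, q, tau=1e-6):
    """One-sided difference quotient of sup_entropy at q in the direction p."""
    shifted = [a + tau * b for a, b in zip(q, p)]
    return (sup_entropy(shifted) - sup_entropy(q)) / tau


def dot(p, s):
    return sum(a * b for a, b in zip(p, s))


q = [0.2, 0.4, 0.4]  # two modes
p = [0.7, 0.1, 0.2]
q_star = uniform_subgradient(q)
```

For these vectors $q \cdot q^* = 0.4 = \max q$, $p \cdot q^* = 0.15 \le \max p$, and the directional derivative in the direction $p$ equals $0.2$, the larger of the two entries of $p$ on the modes of $q$.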

In our final example, we illustrate the fact that at boundary points a 
sublinear function has either no subgradient, or infinitely many. 
Example B.3. Take $\Phi(x, y) = x + y$ on
$\mathbb{R}^2_+ = \{(x, y) \mid x \ge 0,\ y \ge 0\}$. The graph of $\Phi$ is a part
of a plane, so it is easy to see that $\Phi$ has infinitely many supporting planes
at the boundaries of $\mathbb{R}^2_+$. Consider now

\[
\Phi(x, y) = x \ln\frac{x}{x + y} + y \ln\frac{y}{x + y}
\]

on $\mathbb{R}^2_+$, which is Shannon entropy for binary variables. A computation
shows that

\[
\nabla\Phi(x, y) = \left( \ln\frac{x}{x + y},\ \ln\frac{y}{x + y} \right)
\]

and hence $\nabla\Phi(x, y) \to -\infty$ when $(x, y)$ tends to the boundary of
$\mathbb{R}^2_+$. This means that $\Phi$ has vertical tangent planes through the
coordinate axes, which implies that $\Phi$ has no subgradient on the boundary of
its domain.

The situation is the same when $\mathcal{P}^+$ is a subset of an infinite
dimensional vector space. For example, one may use the Hahn-Banach theorem
presented below to show the existence of multiple supporting hyperplanes at
boundary points $q$ for which $\Phi'_+(p, q)$ is finite for all
$p \in \mathcal{P}^+$. If, instead, there is $p \in \mathcal{P}^+$ for which
$\Phi'_+(p, q) = -\infty$, then $\Phi$ has no subgradient at $q$.
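
The behaviour of the binary Shannon entropy in Example B.3 can be checked with a few lines of code. The sketch below evaluates $\Phi$ and its gradient at interior points, verifies 1-homogeneity and the subgradient inequality $\Phi(p) \ge p \cdot \nabla\Phi(q)$ with equality at $p = q$, and shows a gradient component becoming very negative near a coordinate axis. The sample points and function names are chosen here for illustration only.

```python
import math


def shannon(x, y):
    """Binary Shannon entropy x ln(x/(x+y)) + y ln(y/(x+y)) for x, y > 0."""
    s = x + y
    return x * math.log(x / s) + y * math.log(y / s)


def grad_shannon(x, y):
    """Gradient of shannon at an interior point (x, y)."""
    s = x + y
    return (math.log(x / s), math.log(y / s))


def pairing(p, g):
    """Euclidean pairing p . g in the plane."""
    return p[0] * g[0] + p[1] * g[1]


def difference_quotient(p, q, tau=1e-7):
    """One-sided difference quotient of shannon at q in the direction p."""
    return (shannon(q[0] + tau * p[0], q[1] + tau * p[1]) - shannon(*q)) / tau


q = (0.3, 0.7)
p = (0.5, 0.5)
support_value = pairing(p, grad_shannon(*q))  # value of the supporting plane at p
near_axis = grad_shannon(1e-12, 1.0)          # point close to the boundary x = 0
```

At $q = (0.3, 0.7)$ the supporting plane lies below the entropy at $p = (0.5, 0.5)$ and touches it at $q$ itself, while the first gradient component at $(10^{-12}, 1)$ is already below $-27$, in line with the blow-up at the boundary.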


We now state a slight generalisation of the classical Hahn-Banach theorem.
Let $E$ be a real vector space and $K \subset E$ be a convex cone.

Theorem B.4 (Hahn-Banach theorem). Let $\phi : K \to \mathbb{R}$ be a sublinear
function and $l_0 : E_0 \to \mathbb{R}$ be a linear functional on a linear subspace
$E_0 \subset E$ which is dominated by $\phi$ on $E_0 \cap K$, i.e.

\[
l_0(q) \le \phi(q), \quad \forall q \in E_0 \cap K.
\]

Then there exists a linear extension $l : E \to \mathbb{R}$ of $l_0$ to the whole
space $E$ such that

\[
\begin{aligned}
l(q) &= l_0(q), \quad \forall q \in E_0, \\
l(q) &\le \phi(q), \quad \forall q \in E \cap K.
\end{aligned}
\]

In the classical formulation of the theorem, we have $K = E$. The proof
of the version with $K \subset E$ is the same. In fact, if anything, the condition
$K \subset E$ is easier to satisfy than $K = E$ when extending $l_0$.